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Protein distributions measured under a broad set of conditions in bacteria and yeast were shown 
to exhibit a common skewed shape, with variances depending quadratically on means. For bacteria 
these properties were reproduced by temporal measurements of protein content, showing accumu¬ 
lation and division across generations. Here we present a stochastic growth-and-division model 
with feedback which captures these observed properties. The limiting copy number distribution 
is calculated exactly, and a single parameter is found to determine the distribution shape and the 
variance-to-mean relation. Estimating this parameter from bacterial temporal data reproduces the 
measured distribution shape with high accuracy, and leads to predictions for future experiments. 


INTRODUCTION 

The phenotype of a biological cell — in particular, the 
types and copy numbers of its expressed proteins — fluc¬ 
tuates from cell to cell, even among those whose geno¬ 
types and growth environments are identical (reviewed 
in W)- Protein content depends on a complex in¬ 
terplay of genetic, epigenetic and metabolic processes, 
with numerous cell-specific regulatory mechanisms and 
feedback loops. However, recent experiments [4] have 
demonstrated that for two different types of microorgan¬ 
ism (yeast and bacteria), each under a broad range of 
conditions, the distribution of highly expressed protein 
copy number appears universal: under rescaling by mean 
and standard deviation, all such distributions collapse 
onto a single skewed curve [5]. In the same experiments 
variances were found to depend quadratically on their 
means, a trend displayed also by all highly expressed pro¬ 
teins in E. coli in a genome-wide study [6] (see also HD- 

A recent study following the protein content in indi¬ 
vidual E. coli bacteria over roughly 70 generations has 
revealed that, under the same scaling criteria, the shape 
of the distribution of protein copy number sampled over 
time in an individual converges to the one observed in 
large populations [8]. While analogous temporal data 
are currently unavailable for yeast, this is an important 


property that reflects the ergodicity of the relative fluc¬ 
tuations in protein expression in bacteria. These results 
can serve as a basis for constructing a model relating bac¬ 
terial temporal protein dynamics to their distributions. 


MODEL WITHOUT FEEDBACK 

Given that the universal statistical properties de¬ 
scribed above were found for a range of experimental con¬ 
ditions for various proteins in bacteria [4], such a model 
should rely only on general coarse-grained processes. We 
therefore start by assuming as little as possible given the 
experimental data: 

• Protein number increases as during the gen¬ 
eration, where the exponential growth rate ki fluc¬ 
tuates with i [8]. 

• The time Ti of the generation (i.e., the time 
between cell division at the (i — 1)®^ generation and 
that at the is also random [Ms]. 

• The product Xi = kiTi is a random variable, with 
(positive) mean fi and variance . We will refer to 
Xi as the accumulation exponent. 
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• Protein number is conserved at cell division, and 
protein degradation is much slower than a typical 
interdivision time El- 

Let Ni denote the copy number of a given type of pro¬ 
tein in the cell and fi the copy number ratio between 
the daughter and parent cells, both at the end of the 
generation. Incorporating the features listed above gives 
rise to the recursion relation 


= fiNi exp (Xi+i). (1) 

In bacteria fi is narrowly distributed about 1/2 [15]. We 
take fi = 1/2 for now and discuss deviations from this 
assumption later. The solution of 0 for arbitrary gen¬ 
eration number n is 

n 

7V„ = 2-”7Voexp(^x/ (2) 

j = l 

where Nq is the initial copy number. Then 

n 

\n{Nn/No) = -n In 2 + ^ Xj (3) 

i=i 


The mean of Xj should compensate for the decrease in 
protein number in the daughter cells caused by division; 
otherwise copy numbers would be unstable, running off 
to unsustainably large numbers or falling to zero within 
a few generations. Eq. 0 can be rewritten as 


\n.{Nn/N(f) — n{ii— hi2) I^/ncr = — n/jL j\fnG . 


i=i 


( 4 ) 

If the accumulation exponents Xj are independent, the 
central limit theorem gives 


possibly other biological factors [3 US]. The stationary 
distribution shape must therefore be independent of N. 

A natural extension of the growth-and-division model 
consistent with observations is the introduction of an ac¬ 
cumulation exponent that is negatively correlated with 
protein number at the start of the cycle. The experimen¬ 
tal requirement of universality constrains the form of the 
feedback term: a change in scale of X cannot alter the 
functional form of the recursion relation. The only func¬ 
tion with this property of scale invariance is the power 
law; the modified recursion relation is therefore 

A^i+i = /iA^i[exp(^i+i)](Ari/]V)-“ (6) 

with fi defined as before, (which we will call the 
residual accumulation exponent) the component of the 
accumulation that fluctuates independently from gener¬ 
ation to generation, and a new phenomenological param¬ 
eter 0<(a<l;(a = 0is the case without feedback El- 
The recursion relation of the modified model is 

In = \nfi (1 — a) In Ni a In N. (7) 

It is not hard to check that first, there is now a limiting 
stationary distribution, with (N) ^ N; and second, that 
N can be scaled out of the growth equations. 

The introduction of a nonzero a makes the specific 
form of the limiting distribution dependent (though not 
too sensitively; see below) on the distributions of fi and 
f^i. Experiments on bacteria indicate that f^i is approxi¬ 
mately normally distributed (see Eig. 2b). Using this, we 
can solve for the limiting distribution exactly when the 
division ratio is fixed. The limiting distribution is again 
lognormal: 


ln(Xn/Xo) - n(/r - ln2) /^/na ^ A/'(0,1), (5) 


that is, the LHS converges in distribution to the normal 
distribution with mean zero and variance one. There 
is no stationary distribution for this process; the mean 
and variance of Nn vary with time (or n), even when 
/i = In 2 exactly. This conclusion holds independently of 
the various distributions used; all that matters is that 
fluctuations are independent between generations. This 
analysis demonstrates that a stationary distribution, as 
experimentally observed, can result only if some negative 
feedback is present. Given this, we next introduce and 
analyze a modified model with effective feedback regulat¬ 
ing protein accumulation, and following that we discuss 
its experimental justification and consequences. 


MODEL WITH FEEDBACK 

A given protein type in an individual cell has a well- 
defined typical copy number N. Its value is nonuniver- 
sal, depending on protein type, growth conditions and 


P{N) 



X exp 


( 


InX- 



2E2 


( 8 ) 


with A4 = InX -f (/i — ln2)/a and E = a j\/2a — a^. 
These two parameters together determine the mean and 
variance of the distribution: specifically, {N) = exp{Af-l- 
EV2}, and (X^) - (X)^ = (e^' - 1) exp{2X1 -h E^}. 
However, only E determines the shape of the distribution; 
different values of Xl collapse on one another following 
scaling by a linear transformation. Moreover, for fixed E 
the variance scales quadratically with the mean. 

In terms of the model, Eq. Q has several important 
features. Eirst, it preserves universality under scaling 
with respect to all variables that appear only in the pa¬ 
rameter XI, because the distribution shape is indepen¬ 
dent of this additive term. Consequently, all values of X, 
/i, and division ratio yield the same distribution shape. 

Second, as noted above a single composite parameter 
E determines the shape of the distribution. E charac¬ 
terizes the balance between the variance of accumulation 
exponents, which tends to drive the process to diverge. 
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and the effective feedback parameter a, which provides a 
“restoring force”. Once S is determined, both properties 
- collapse of scaled distributions and quadratic depen¬ 
dence of variance on mean - are preserved. 

Third, it should be noted that in the setting of our 
model, the limiting steady state distribution equally well 
represents the time average over many generations of a 
single individual or the average at a single large time over 
a large population in which the individuals are evolving 
independently. 

The analysis above assumed a Gaussian distribution 
of the accumulation exponents and a fixed value of the 
division ratio. We now explore the robustness of our 
conclusions. Without these assumptions, the lognormal 
solution will no longer be exact, but will not be signifi¬ 
cantly altered for a variety of unimodal distributions for 
both variables. Moreover, the scaling properties within 
classes characterized by S (defined as before and which 
is now close to but not exactly equal to the standard 
deviation of InA^) still hold. Fig. la shows examples of 
means and variances computed from many simulations, 
in which the /^’s were drawn from a Gaussian distribu¬ 
tion and the ^^’s from a gamma distribution. For each 
simulation, a and N were chosen randomly, and the vari¬ 
ance of the gamma distribution was adjusted to give 
a shape parameter H equal to one of three values: 0.2, 
0.4 or 0.6. The resulting means and variances are seen 
in Fig. la to collapse onto three parabolas corresponding 
to the three classes defined by the value of S. Limiting 
distributions are shown in Figs, l(b-d); while they are 
all very close to lognormal, the different H’s lead to a 
range of shapes, from nearly Gaussian (lb; S = 0.2), to 
skewed with exponential-like tail (Ic; S = 0.4) and finally 
to highly skewed (Id; S = 0.6). Within each class the 
distributions from all simulations collapse after rescaling 
onto a single curve. Thus, both properties of distribu¬ 
tion collapse and the quadratic dependence of variance 
on mean hold once S is fixed, even though the condi¬ 
tions of the exactly solvable model are relaxed. 

In addition, the assumption of symmetric division can 
be relaxed as long as the average accumulation compen¬ 
sates for the loss at division. The assumption of stable 
proteins can also be relaxed to include first-order protein 
degradation; this would require an additional parameter 
and would modify only the accumulation exponent. 


COMPARISON WITH DATA 

To test the assumption of negative correlation we plot¬ 
ted experimental values of AlnA^ = InA^+i — \nNi 
vs. In Ai, as measured for bacteria, in Fig. 2a. The data 
points were collected from six individual trajectories nor¬ 
malized to unit average (data from [8]). In agreement 
with Eq. Q, the data are consistent with random scatter 
about an overall linear dependence, with negative slope 




< N) Scaled N 




Scaled N Scaled N 

FIG. 1: Model simulations. The stochastic process 
described by Eq. (H was simulated over 20,000 
generations for each run. Values for a and N were 
chosen uniformly at random in the range [0,1] and 
[0.25,0.75], respectively. The division ratio fi was drawn 
from a Gaussian distribution with mean 1/2 and 
standard deviation 0.1, and from a 
gamma distribution whose variance was adjusted to a 
to obtain one of 3 values of the shape parameter E. (a) 
Variance vs. mean of 18 simulations for each shape 
parameter show a collapse on 3 corresponding 
parabolas, (b-d) Limiting distributions of three 
simulations for each class are all well fit by lognormal 
but have different shape parameters and so span a 
range of shapes, from approximately Gaussian (b), to 
an exponential-like tail (c), to a highly skewed 
distribution (d). In each class the distributions collapse 
onto one another to high accuracy. 


determining a to be approximately 0.37T0.04. Using this 
value the residual accumulation exponents can be ex¬ 
tracted from the data using Eq. 0 and measured values 
of Ni. The approximately Gaussian distribution of these 
exponents is shown in Eig. 2b, and their independence 
between consecutive generations is evident from Eig. 2c. 

The parameter determining the distribution shape in 
our model is E = a j\j2a — . Estimating a and a from 

Eigs. 2a and 2b respectively, we find E 0.4 ± 0.02. 
Eig. 2d shows the lognormal distribution of Eq. 0 cor¬ 
responding to this parameter in rescaled units (black 
line), together with data from a large bacterial popu¬ 
lation (grey circles), and single cell trajectories (black 
squares and red stars) that exhibits the measured univer¬ 
sal distribution shape over several decades of probability. 
We note that this is not a fit, but a model prediction 
with no adjustable parameters: the single parameter de- 
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Scaled Fluorescence 


FIG. 2: Comparison with data, (a) 

AlnA^^ = In — In plotted vs. In in units of N. 
Solid line is ^ = —0.37x. (b) Accumulation exponents in 
consecutive generations are approximately normally 
distributed with average ln2 (subtracted out in the 
figure), and (c) are independent between generations. 
Solid line is ^ = 0.0075x — 0.00036. (d) Estimating the 
universality class parameter E for these data from (a) 
and (b), the distribution shape is predicted by Eq. 
(black line) and compared to data. Grey circles: large 
population snapshot; black squares: protein trajectories 
of individual trapped bacteria; red stars: sampled 
points at the end of each cell cycle. Data from mi]. 

termining the distribution shape is computed separately 
using the single-cell dynamic measurements. 

DISCUSSION 

Motivated by experiments that found universal protein 
distributions under various conditions in yeast and bacte¬ 
ria, and by single-cell measurements of protein accumula¬ 
tion and division in bacteria across multiple generations, 
we have presented a model based on the premise that 
the combined processes of growth, division, and feedback 
set the distribution shape. With fixed division ratio and 
Gaussian randomness the model is exactly solvable. The 
solution identifies a single parameter E (Eq. defining 
the distribution shape: it quantifies the balance between 
growth of variance and feedback that stabilizes protein 
numbers. With E fixed, a rescaling by mean and stan¬ 
dard deviation collapses these distributions onto a single 
curve, and displays a quadratic relation between variance 
and mean. 

Thus, our model predicts that populations in the same 
class — i.e., which share the same shape parameter — 
exhibit similar set-point balance between the opposing 


forces in the dynamics of their protein content across 
time, i.e. between the variance of accumulation expo¬ 
nents (cr) that drive the process to diverge, and the feed¬ 
back parameter {a) that prevents divergence. Therefore, 
if the variance of the exponents changes, the feed¬ 
back parameter a should change in a correlated manner. 
To test this prediction, single-cell dynamical trajectories 
need to be measured over a variety of conditions that 
span these parameters. Another possibility consistent 
with our model is that both a and a are fixed. At the mo¬ 
ment, experimental perturbations — for example, chang¬ 
ing medium or temperature — can change the mean, for 
example by modifying the mean cell cycle time; but their 
effect on the variance of exponents is unknown [10]. 

Our approach shares some features with previous the¬ 
oretical work but differs in other respects. Earlier work 
focused on protein accumulation and division [T8H22] , or 
protein accumulation and continuous dissipation |23l [24| . 
The recent data on protein content over multiple gener¬ 
ations |8| shows that, due to the exponential nature of 
protein accumulation, division or dissipation alone can¬ 
not stabilize copy numbers, and reveal a correlation be¬ 
tween variables across generations. 

The classes of proteins of interest are those consisting 
of high-copy-number molecules, characterized by expo¬ 
nential accumulation between successive cell divisions. 
The exponential accumulation of protein during a cell 
cycle suggests that protein production reflects a coher¬ 
ent integration of many correlated processes in the cell. 
Exponential growth of the cell size between divisions, 
as well as negative correlation analogous to the one re¬ 
ported here, were measured in several recent experi¬ 
ments [inillll[l3l[25| . Moreover, results on trapped bac¬ 
teria show explicitly that the exponents of cell size growth 
and protein accumulation are strongly correlated on a 
cycle-by-cycle basis [8]. This suggests a picture where 
highly expressed proteins that are strongly coupled to cell 
metabolism are components of multi-dimensional pheno¬ 
types that covary between individual cells. This view is 
supported by a model recently proposed to explain expo¬ 
nential biomass growth as resulting from an interacting 
network of reactions IMHHl- Eurthermore, our model 
is mathematically related to a recently proposed model 
of cell size regulation [12] |28], which finds under simi¬ 
lar assumptions a lognormal distribution with the same 
compound parameter governing its shape. Eor highly 
expressed proteins, this may be expected since protein 
production and cell growth are tightly coupled [8] . How¬ 
ever, there are also important differences between the two 
models, which we address in detail in the Appendix. 

Our model addresses directly the universal behavior of 
bacterial protein distribution among different biological 
realizations, including expression regulation mechanisms, 
growth conditions, and types of microorganism. Its in¬ 
gredients are independent of specific biological mecha¬ 
nisms and rely on those general aspects of cellular events 
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— exponential protein accumulation, division and feed¬ 
back — that are likely to be common to all dividing cell 
populations. This marks a significant departure from the 
main current line of research on protein number varia¬ 
tion, which investigates synthetically produced proteins 
while experimentally isolating the contribution of specific 
microscopic mechanisms [29H34] . 

In particular, we have observed that feedback must be 
present, because without it the mean and variance neces¬ 
sarily drift to larger values as time increases. Moreover, 
regardless of the specific processes leading to feedback 
(which may differ for different protein types and organ¬ 
isms) , the mathematical form of the feedback in a growth- 
and-division model must be power law to be consistent 
with universality. 
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APPENDIX: UNIVERSAL PROTEIN 
DISTRIBUTIONS AND CELL GROWTH 

In this appendix, we detail the differences between our 
model and a mathematically similar model of cell size 
regulation recently proposed by Amir [121 [28]. However, 
before describing those differences, we first discuss some 
nontrivial consequences following from the similarity be¬ 
tween the models. The fact that similar mathematical de¬ 
scriptions, albeit with different biological interpretations, 
may capture the dynamics of two distinct phenotypes is a 
potentially significant mathematical unification resulting 
from a strong coupling among disparate biological pro¬ 
cesses. As noted above, cell size and protein copy number 
are two separate, but strongly correlated, phenotypes of 
the cell. Which one controls the other, or whether both 
are regulated together, is not known at this time. One 
interesting implication of the work presented here is that 
not only cell size, but also the total content of highly ex¬ 
pressed proteins, is under the control of what appears to 
be a global cellular feedback, supporting the viewpoint 
that protein copy number variation is a global variable. 
This marks a significant departure from the main current 


line of research on protein number variation. 

We now turn to a discussion of some important differ¬ 
ences in the interpretation and consequences of the two 
models. First, the requirement that the average must 
scale out of the distribution shape in any model of univer¬ 
sal distributions necessitates mathematically the power- 
law form of the feedback. Second, the parameter F that 
governs the “universality class” of the family of distribu¬ 
tions has been identified, and its constancy leads both to 
the collapse of all curves under linear scaling and to the 
observed quadratic relation between variance and mean. 
The latter is an especially important consequence and 
has no analogue in cell size distributions. We have also 
seen numerically that the division into such classes ex¬ 
tends beyond the conditions of the analytically solvable 
case, rendering this result robust with respect to a wide 
class of distributions of the underlying random variables. 

Perhaps most importantly, our model makes specific 
predictions on the constrained changes allowed in pro¬ 
tein trace parameters under varying conditions. Similar 
predictions cannot currently be made for cell size distri¬ 
butions. 

Both our model and that of [laiiH] are also easily mod¬ 
ified to handle asymmetric division, as in yeast. However, 
until data are available that relate temporal to popula¬ 
tion statistics, it remains to be seen to what extent the 
dynamics of proteins across generations in yeast can be 
described by the approach outlined in the main text. 

A final important difference between the approach de¬ 
scribed in this paper and that in [28] concerns the na¬ 
ture of the feedback itself. Analysis of E. coli data led 
to the conclusion that cell size feedback is characterized 
by a = 1/2 [28], corresponding (in leading order) to the 
proposal that the feedback arises from constant addition 
of volume over the cell cycle. In contrast, a different 
mechanism(s) may apply for copy numbers, and a can in 
principle vary among protein types: our current results 
are consistent with various values of a with an average 
of (a ~ 0.37. However, at this stage there is no direct 
evidence that can determine which phenotype (cell size, 
protein copy number, etc.) controls the division point of 
the cell and thus the feedback mechanism that controls 
it. In principle, it could also be a cellular state that is 
defined by several phenotypes simultaneously. 

We show this difference explicitly in the figure below. 
Fig. louses cell size data from [8] to compute the feedback 
parameter a, in manner similar to that used in Fig. 2a, 
for the cell size phenotype. This results in a ~ 0.5, in 
agreement with [28], but different from the value a ~ 0.37 
shown in Fig. 2a. 
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